Copying and
storing Web pages is vital to the Internet's survival -- but is it legal?
One night, while browsing the World Wide Web through
his America Online account, a colleague of mine came across a site that linked
to our firm's Web site (www.cooley.com). In a moment of curiosity, he clicked
on the link to retrieve the firm's home page. To his horror, the page he
received was a long-outdated version that did not reflect our firm's recent
investments to improve the site. The attorney logged out wondering why the site
that linked to our firm's Web site had chosen to link to an outdated page.
My colleague and our firm were the victims of "caching" by America Online.
Okay,
but what is "caching"?
Caching
is a nebulous term, tautologically defined as the process of storing something
in a place of storage. On the Internet, caching occurs at multiple levels.
First, many browsers cache "locally," storing recently visited Web pages in the computer's RAM -- random access memory. For example, a person running
Netscape Navigator who selects the "back" button will, most times,
retrieve a page from RAM instead of receiving a "fresh" copy of that
page downloaded from the actual Web site.
Caching
also occurs at the server level -- termed "proxy" caching. The most
obvious users of proxy caching are online services such as AOL, CompuServe and Prodigy, which store the most frequently requested pages on their own computers.
Then, when a user requests a page that has been cached, the online service will
deliver a copy from its own computers' memory -- not from the Web site in
question. This is exactly what happened to our intrepid attorney.
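To make the mechanism concrete, here is a minimal sketch (in Python, with hypothetical names -- not a description of any actual service's system) of the core logic a proxy cache applies: serve a stored copy when one exists, otherwise fetch from the origin Web site and store the result. Note that on a cache hit, the origin site never sees the request -- which is exactly how a stale page gets delivered.

    # Minimal proxy-cache sketch. ProxyCache and its behavior are
    # illustrative assumptions, not AOL's actual implementation.
    import urllib.request

    class ProxyCache:
        def __init__(self):
            self._store = {}  # URL -> previously fetched page bytes

        def get(self, url):
            if url in self._store:
                # Cache hit: serve the stored (possibly outdated) copy;
                # the origin Web site never learns of this request.
                return self._store[url]
            # Cache miss: fetch a fresh copy and keep it for later users.
            page = urllib.request.urlopen(url).read()
            self._store[url] = page
            return page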
TO CACHE . . .
On
the Internet, caching speeds user access to Web pages and reduces demands on a
limited infrastructure. The following diagram indicates the typical data flow
of a Web page requested by a user:
[Subject Web site]
[Subject Web site's connection to the Internet]
[Internet]
[Requester's connection to the Internet]
[Requester's computer]
Several
of these levels are subject to congestion and therefore can benefit from
caching.
First,
the Web site's server may be overloaded and therefore unable to process
requests. Second, the Web site may have an inadequate connection to the
Internet or may be using an intermediate access provider that is subject to its
own congestion. Third, the Internet is subject to congestion, as the data is
broken into packets and sent via potentially congested pipelines to different
computers that may be backlogged with other data to process. Fourth, the
requester's access provider may be congested. Finally, the requester -- the
person accessing the Web -- may have an insufficient connection to the Internet
or be running an underpowered computer.
If
some of these steps are bypassed, the benefits are obvious: With fewer stops at
potentially congested sites, the data is delivered faster. Conversely, if every request for every Web page were filled by going through the full process
described above rather than from a cache, the increased data flow could easily
overwhelm the already-teetering infrastructure of the Internet, making it a
victim of its own success. Not only would such demand interfere with the smooth
operation of the Web, but it would affect the flow of all data on the Internet
-- e-mail, net telephony, FTP, and so on.
. . . OR NOT TO CACHE
While caching is currently essential to the successful functioning of the Web, it (and proxy caching in particular) has created a number of problems for both users and publishers of content.
The
first major problem, as the opening example showed, is that caching interferes
with the ability of Web sites to control what data is delivered to people
requesting a page. In our law firm's case, AOL users could believe,
incorrectly, that our firm is not investing resources to maintain our Web
presence -- and in the profession of law, where image is critical, this
perception could harm the firm's practice.
However,
Web page owners' lack of control over the caching process can have even more
insidious results. For example, imagine that a Web publisher discovers that
some information she has posted is harmful in some way -- perhaps the information
is inaccurate or infringes someone's copyright. Even if the publisher discovers
this problem and corrects it on her site, the harmful information will be
disseminated to end users until all caches containing the old version of the
page are refreshed. Furthermore, if users do not know they are receiving pages
from a cache, they may incorrectly assume they are getting up-to-date
information. If someone is seeking real-time stock market quotes or does not
realize that an analysis of the law has been mooted by subsequent developments,
the consequences could be painful or expensive.
Also,
consider Web sites that sell advertising on a time-sensitive basis -- such as a
banner ad slot between 6:30 and 7 p.m. Unless all proxy caches are refreshed
precisely at the beginning and end of the period for which advertising is sold,
such Web sites cannot successfully implement this plan -- either the ad will
get less time (possibly no time, if the cache does not refresh at all during
the ad's time slot) or more time (for example, if the cache refreshes at 6:30
p.m. and then does not refresh for another 48 hours) than was paid for.
The
second major problem is that caching interferes with Web sites' analysis of
their users. This problem is most acute for Web sites that charge advertisers
based on the amount of data delivered to users.
For
example, most major advertising-driven Web sites, such as HotWired, Pathfinder
and Netscape, charge based on the number of times a banner advertisement is
displayed to users (often called "page impressions"). Since a cached
page is downloaded from the cache and not the actual owner's Web site, the Web
site owner does not know whether or how often a given page was viewed from the
cache, and cannot charge its advertisers for such page impressions.
Predictably, this makes advertising-driven Web site owners unhappy, since
caching means lost revenues. In fact, page impression data is so valuable to
advertising-driven Web sites that at least one online service markets to Web
site owners data about the number of page impressions delivered from its cache.
Beyond page impression data, Web site owners can extract value by understanding user activity in other ways. Indeed, a whole
science has developed to analyze "server logs," which record the
activities of Web site users. Again, when this data flows to the online service
and not the Web site, the Web site is unable to realize the value of its
relationship with users.
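As a rough illustration (the log file name and format below are assumptions -- a typical "common log format" access log), counting page impressions from a server log might look like the following sketch. The key point: requests answered from a proxy cache never reach the server, so they never appear in these counts.

    # Sketch: tally page impressions per URL from a common-format
    # server log. File name and regex are illustrative assumptions.
    import re
    from collections import Counter

    REQUEST = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

    def page_impressions(path="access.log"):
        counts = Counter()
        with open(path) as log:
            for line in log:
                match = REQUEST.search(line)
                if match:
                    counts[match.group(1)] += 1
        return counts  # requests served from a cache are invisible here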
Finally,
the caching entity itself faces some peril from proxy caching. Under some
paradigms of online law (which are very much in flux right now), a caching
entity could possibly be liable for claims of defamation, invasion of privacy
and other torts faced by publishers/republishers. The case law also suggests
that a caching entity could be liable for copyright infringement, both of the
Web sites being cached, and of third parties if the cached pages have infringed
others' copyrights. Last but not least, the proxy cache could contain
pornography or obscene materials, which creates the possibility of being
liable, or at least harassed by zealous prosecutors, for such material.
THE LAW OF THE CACHE
Caching
implicates a number of the exclusive rights of copyright holders under 17
U.S.C. §106, including (i) reproduction (by making an extra copy into RAM or
possibly a hard drive) and, in the case of proxy caching, (ii) distribution,
(iii) public display, and possibly (iv) public performance and (v) digital
performance. Despite the sometimes illogical consequences of treating a copy in RAM as copyright infringement, courts have consistently reached this result.
However,
caching may be "fair use" under 17 U.S.C. §107, providing a defense
to an infringement action. The multi-factor fair-use test considers the purpose
of the use; whether the infringed work was published or unpublished and was
fact or fiction; the amount and substantiality of the portion taken; and the
effect of the infringement on the market for the work. Given its multi-factor analysis,
litigation over whether a use was fair tends to be cumbersome. The following is
a preliminary breakdown of how the factors might play out in a suit:
· Purpose of use. Proxy caches normally operate to benefit customers and reduce investment in infrastructure. Thus, such caching has a commercial purpose. Although the facts will differ in each case, this factor is more likely to weigh against fair use.
· Fact/fiction; published/unpublished. All material available on the Internet is by definition "published" for purposes of copyright. Whether the cached work is fact or fiction will depend on the specific circumstances.
· Amount and substantiality of portion taken. Caches almost invariably make a copy of entire Web pages, which in turn may have a number of elements -- graphics, for example -- that are subject to their own copyright. In these situations, the amount taken will be 100 percent of the copyrighted works, which usually (but not always) precludes a finding of fair use.
· Effect on the market. Under copyright jurisprudence, this is the most important factor. While it is difficult to define the "market" for Web pages that are made available for free, caching causes Web sites to undercount page impressions -- information of value -- so caching could be deemed to interfere with the market for page impression data. On the other hand, for pages that do not sell advertising to third parties (www.cooley.com, for example), it is very difficult to define what "market" is interfered with by caching. Because it is difficult to know how a court will analyze this factor, it is equally difficult to know if caching will be deemed fair use.
The
fair-use factors, perhaps predictably, lead to no definitive answers. Thus,
relying on fair use to justify caching, particularly proxy caching, is a
precarious position under existing copyright law. In fact, in one recent
transaction I negotiated with a major online service, the service added a
"license to cache" my client's Web site to an agreement that
otherwise had nothing to do with caching -- presumably to remove any doubt.
Although
proxy caching appears to be more problematic than local caching, it is not
clear that local caching will be free from suit. For example, such a suit might
arise in the case of a large company where the cumulative effects of local
caching by many Web browsers (perhaps combined with statutory damages and
attorneys' fees) are significant.
One
last point under copyright law: Some have argued that caching is permitted
under an "implied license" that Web site owners grant simply by
making their content available over the Internet. This argument makes sense in
that if Web site owners want users to browse -- that is, load pages into RAM,
and thereby make a copy -- they must grant an implied license to make that
copy. However, as applied to caching, the argument for implied license is
predicated on the existence of technology that permits Web sites to control
caching -- and therefore, any Web site that fails to use such technology grants
an implied license to cache. Yet there is no other place under existing
copyright law where copyright holders' failure to use technology to reduce
infringement creates an implied license to infringe. Indeed, placing the burden
on copyright holders would be inconsistent with the general legislative trend
toward increased protection for copyright holders.
CACHE ME IF YOU CAN
Some
existing technologies allow Web sites to control the caching process. First,
Web sites can create "dynamic pages" that are displayed to users only
after the user initiates a server-resident program (a "CGI script"). While this solves the problem, CGI scripts are currently somewhat expensive to
program. Second, Web sites can code their pages with "expiry"
information, which tells the proxy cache when to refresh the cached page.
However, there are no current standards for recognizing expiry information, so
Web sites that properly code their pages may find that some proxy caches ignore
their instructions. Furthermore, because of the danger of persistent inaccurate
information, Web sites have incentives to make the expiration time short or to
code the page so that it will not be cached at all, which then reduces the
benefits of caching for everyone.
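To sketch how expiry coding is supposed to work -- assuming a cache that actually honors it; the function names here are hypothetical -- the cache's decision looks roughly like this:

    # Sketch of a proxy cache honoring an expiry timestamp. A cache
    # that ignores expiry information would skip the freshness check
    # and serve the stored copy regardless -- the problem noted above.
    import time

    def serve(url, cache, fetch):
        # cache maps url -> (page, expires_at); fetch(url) returns a
        # fresh (page, expires_at) pair from the origin Web site.
        entry = cache.get(url)
        if entry is not None:
            page, expires_at = entry
            if time.time() < expires_at:
                return page  # still fresh per the site's own coding
        # Expired or never cached: go back to the origin site.
        page, expires_at = fetch(url)
        cache[url] = (page, expires_at)
        return page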
As
for copyright law, any non-technological solution to permit caching will need
to be legislative. Prior situations where fair-use doctrines were stretched
have been clarified by amending the copyright law. Examples include copying
into RAM done by computers during normal operation (17 U.S.C. §117), and the "ephemeral
recordings" made by broadcasters (17 U.S.C. §112), both now protected uses
under copyright law. Without a similar legislative response, confusion over
caching will reign.
Whether
or not caching should be permitted under copyright law ultimately depends on a
policy determination about the operation of the Internet. Treating caching as
an infringement will increase data flows over the Internet and likely overwhelm
its existing infrastructure, which in turn will require enormous investments in
infrastructure expansion or a sure-to-be-unpopular congestion/metering pricing
scheme. On the other hand, if caching is legal, then standards must be
developed to allow Web sites to control the way their information is
published online.
Because
caching is fundamental to the Internet, a non-technological solution to the
problem will require Solomon-like wisdom. In the absence of such wisdom, we can
expect to hear plenty more about caching in the near future.
Eric Schlachter is an attorney practicing cyberspace law with Cooley Godward Castro Huddleson & Tatum. He is also an adjunct professor of cyberspace law at Santa Clara University School of Law.